1. INTRODUCTION

Content-based image indexing and search systems allow us to search a database for images according
to their visual characteristics [1]. These characteristics, also called low-level visual features, are color, texture
and shape [2], [3]. The semantic gap between low-level features and the high-level semantic understanding of
images is often hard to bridge [4]. Content-based image retrieval (CBIR) systems in the compressed domain use a
global descriptor [5]-[8], such as the histogram of the discrete cosine transform (DCT) coefficients [9] or of the
discrete wavelet transform (DWT) coefficients [11]-[13], to represent the images. Several methods have been
developed. We cannot cite them all, but we summarize the works that are most closely related to our
approach. These methods use the histogram to capture the distribution of the features, without any
information about their location [14]. Such global features are computed from the entire image
and are often unable to capture the local details of natural images. Consequently, dissimilar images that differ
only in local detail may have similar feature vectors, which makes it difficult for a low-level
feature representation to relate reliably to the semantics of the image.

Region-based search systems attempt to fill the gaps of content-based search systems [15]-[17]. A
region-based search system applies segmentation [18]-[20] to decompose each image into regions, which correspond
to objects if the segmentation is perfect [21]. The object representation is intended to be close to the perception
of the human visual system (HVS). Since the search system identifies the objects in the image, it can more
easily recognize similar objects at different locations and with different orientations and sizes.
Region-based search systems include the Netra system and the Blobworld system. Both compare images
region to region, which transfers part of the comparison task to the user. To search for an image, the user is
shown the segmented regions of the query image and is asked to select the regions that contribute to the search,
as well as the attributes of those regions, for example color and texture, that are used to assess
similarity. To measure the similarity between images, Datta et al. [2] proposed the integrated region
matching (IRM) algorithm, which allows a region of one image to be matched to several regions of another image
from the database. In other words, the mapping of regions between two images is a many-to-many relationship [24].
The similarity between two images is therefore defined as the weighted sum of the distances, in feature
space, between all the regions of the two images. Compared with search systems based on individual regions,
like Blobworld, the IRM approach decreases the impact of imprecise segmentation [20]. IRM incorporates the
properties of all segmented regions so that the information in an image can be fully utilized. To increase
robustness against segmentation errors, IRM compares one region with multiple regions in another image. Each
region is assigned a weight that reflects its importance. There are several ways to assign this weight.
Some schemes assume that all regions are equally important. IRM assumes that important objects in an image
tend to occupy larger areas, and therefore uses a scheme based on the percentage of area. Another method,
called adaptive region matching (ARM), which addresses the similarity measurement problem
in region-based image retrieval, was proposed in [25]. To decrease both the negative influence of interfering
regions and the loss of significant information, ARM calculates a region importance index (RII)
to find the semantically meaningful region (SMR). In addition, ARM automatically performs an
SMR-to-image search or an image-to-image search depending on whether the image has a dominant region or not.
On the other hand, the significant region-based image retrieval (SRBIR) model proposed in [26] identifies a
region of importance in an image using a mechanism based on visual attention and represents this region using
color descriptors and descriptors based on the curvelet transform. Recently, integrated category
matching (ICM) was adopted by Meng et al. for similarity measurement, combined with a centroid-based
significance index method, to match all merged regions in images according to their importance. For feature
extraction in the ICM method, a regional convolution mapping feature (RCMF) based on convolutional neural
networks (CNN) was used. RCMF is further combined with the number and distribution of regions to
represent the characteristics of merged regions. They employ the VGGNet19 model pretrained on ImageNet,
which has smaller convolution kernels and a larger network depth.

In recent years, the technology for handling images and video has undergone significant change, and the
great majority of content is nowadays handled in compressed form. Lossy compression based on quantized
DCT blocks is a proven, highly efficient technique used in major compression standards (JPEG, MPEG1/2/4,
H26X, VVC). This technique reduces the size of the content to a small fraction of the original size
while preserving its perceptual quality well. Since perceptual quality is also of major importance for pattern
recognition, one can conclude that compression may provide an interesting perspective for studying recognition
problems. In this direction, Defee and Zhong have proposed an approach that unifies statistical and structural
DCT information for image search. They describe structural information as patterns
by splitting the patterns into square regions. To make it possible to extract the characteristics of
arbitrarily shaped regions for region-based image retrieval (RBIR) systems, Liu et al. have used projections
onto convex sets (POCS) in the wavelet domain. Recently, we have proposed a new region-based image
retrieval scheme in the DCT domain that uses a shape-adaptive DCT [30]. In this paper, we propose to optimize
this scheme by using an efficient and adapted histogram which discriminates the border block patterns from the
interior block patterns. Experimental results on two public datasets, Corel-1000 and Caltech-256, show that
the proposed method (PM) explores local similarity more effectively than the existing visual similarity
approaches.

This paper is organized as follows. Section 2 details the principle of constructing the AC and DC histogram
patterns from the DCT and SA-DCT coefficients. In section 3, we describe several distances used to
measure the similarity between two histograms. Different criteria for the evaluation of our system are
presented in section 4. One of the problems of an image-by-object search system is how to compare
two images, which is addressed in section 5. The detailed description of our RBIR system is
presented in sections 6 and 7. In section 8, we present and analyze the experimental results. A
conclusion is drawn in section 9.


2. IMAGE DESCRIPTORS IN THE DCT DOMAIN 

Suppose that $I$ is the intensity of the image we want to transform and $\mathbf{x} = (x, y)$ are the spatial
coordinates of the pixels of this image. Suppose $\{b_1, \ldots, b_K\}$ is the set of adjacent blocks, i.e., the
collections of pixel coordinates, which divide the image. To allow a compact notation, we write:


$$I_{b_j} = \{ I(\mathbf{x}) : \mathbf{x} \in b_j \}. \tag{1}$$


Each block $b_j$ is transformed into the frequency domain by the DCT: $\tilde{I}_{b_j} = \mathrm{DCT}(I_{b_j})$.
Here $\mathbf{u} = (u, v)$ are the frequency coordinates of the DCT coefficients and $\tilde{I}(\mathbf{u})$ is the
DCT of an $N \times N$ image represented by the pixels $I(\mathbf{x})$ such that $x, y = 1, \ldots, N$.


2.1. Shape-adaptive transform 


In our approach, we need a background extraction for foreground detection [31]. After that, to extract
the features of the regions, we propose to apply the shape-adaptive DCT (SA-DCT) to each segment $S$ of the
boundary blocks of the region of interest (Figure 1(a)) and the classical DCT to the interior blocks. The basic
concept of the SA-DCT is to perform vertical 1-D DCTs on the active pixels first (Figure 1(b)), and then to
apply horizontal 1-D DCTs to the vertical DCT coefficients with the same frequency index (Figure 1(c)). The most
important benefit of the SA-DCT is its capability to adapt to arbitrarily shaped regions; the method falls back to
the standard DCT on rectangular image blocks.




Figure 1. Illustration of SA-DCT for (a) arbitrarily-shaped region, (b) vertical alignment followed by vertical 
1-D DCTs, and (c) horizontal alignment followed by horizontal 1-D DCTs 
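
The two-pass procedure described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes an orthonormal DCT-II for every 1-D transform (the text does not specify the normalization), and the function names `dct_1d` and `sa_dct`, as well as the list-of-rows format of the block and its binary mask, are hypothetical choices made for the example.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of arbitrary length (assumed normalization)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        out.append(scale * acc)
    return out


def sa_dct(block, mask):
    """Shape-adaptive DCT of the active pixels of a block.

    block and mask are lists of rows; a truthy mask entry marks an active pixel.
    Returns ragged rows of coefficients (one coefficient per active pixel).
    """
    rows, cols = len(block), len(block[0])

    # Pass 1: shift the active pixels of each column to the top,
    # then apply a vertical 1-D DCT whose length is the number of active pixels.
    columns = []
    for c in range(cols):
        active = [block[r][c] for r in range(rows) if mask[r][c]]
        columns.append(dct_1d(active) if active else [])

    # Pass 2: gather the coefficients sharing the same vertical frequency index,
    # shift them to the left, then apply a horizontal 1-D DCT.
    result = []
    for r in range(rows):
        row = [col[r] for col in columns if len(col) > r]
        if row:
            result.append(dct_1d(row))
    return result
```

When the mask is entirely active, the two passes reduce to a separable 2-D DCT of the block, which is consistent with the fallback behaviour mentioned in the text.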


Recall that $\tilde{I}_{b_j}$ are the DCT-transformed blocks of intensities $I_{b_j}$. Let $P^n_{b_j}$ be the $n$-th
segment $S^n_{b_j}$ after SA-DCT. Note that the shape of $P^n_{b_j}$ is different from that of $S^n_{b_j}$ due to the
executed vertical and horizontal shifts, but that the number of pixels is unchanged. Also, let
$\tilde{i}^n_{b_j}(\mathbf{u})$ be an SA-DCT coefficient in $P^n_{b_j}$ at frequency $\mathbf{u}$. To
construct the AC-pattern as shown in subsection 2.2, we select at most 9 coefficients in each extrapolated
segment $\tilde{I}^n_{b_j}$, where $\tilde{I}^n_{b_j}$ is the extrapolated $n$-th segment of the SA-DCT-transformed
block intensity:


$$\tilde{I}^n_{b_j}(\mathbf{u}) = \begin{cases} \tilde{i}^n_{b_j}(\mathbf{u}) & \text{if } \mathbf{u} \in P^n_{b_j}, \\ \nu & \text{otherwise}. \end{cases} \tag{2}$$


2.2. Histogram of the high frequency components ACs 


In this study, we consider 4 × 4 DCT blocks. The method takes into account 9 AC coefficients
(Figure 2(a)) among the 15 coefficients used in [9]. These coefficients form 3 groups (horizontal:
$C_1, C_2, C_3$; vertical: $C_4, C_8, C_{12}$; diagonal: $C_5, C_{10}, C_{15}$). For each group, we compute
the sum of its coefficients (Figure 2(b)) and the sum of squared differences (Figure 2(c)), and these
statistics are used to build the AC-pattern (Figure 2(d)). This selection is retained because it
is able to represent the internal structure of the contents of the block and reduces the complexity of the feature
vector [30]. To build the histogram $H_{AC}$ of AC-patterns for an image, we count the number of occurrences
of each AC-pattern in this image.

[Figure 2 panels: (a) the three groups of AC coefficients; (b) $S_1 = C_1 + C_2 + C_3$, $S_2 = C_4 + C_8 + C_{12}$,
$S_3 = C_5 + C_{10} + C_{15}$; (c) $\delta_1 = \sum_{i=1,2,3} (C_i - S_1)^2$, $\delta_2 = \sum_{i=4,8,12} (C_i - S_2)^2$,
$\delta_3 = \sum_{i=5,10,15} (C_i - S_3)^2$; (d) texture-pattern $(\delta_1 \mid \delta_2 \mid \delta_3)$.]


Figure 2. AC-pattern construction process of (a) three groups of AC coefficients extracted from DCT block, 
(b) sums of each group, (c) sums of squared-differences, and (d) texture-pattern 
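
As an illustration of the construction in Figure 2, the sketch below computes the texture-pattern of one block and counts pattern occurrences over a set of blocks. It assumes that the 16 coefficients of a 4 × 4 block are given in raster order (index 0 being the DC term) and that they have already been quantized; the names `GROUPS`, `ac_pattern` and `ac_histogram` are hypothetical.

```python
from collections import Counter

# Raster-order indices of the three AC groups of a 4x4 DCT block (assumed layout).
GROUPS = {
    "horizontal": (1, 2, 3),
    "vertical": (4, 8, 12),
    "diagonal": (5, 10, 15),
}


def ac_pattern(coeffs):
    """Return (delta_1, delta_2, delta_3) for the 16 coefficients of one block."""
    if len(coeffs) != 16:
        raise ValueError("expected the 16 coefficients of a 4x4 DCT block")
    pattern = []
    for indices in GROUPS.values():
        group_sum = sum(coeffs[i] for i in indices)                  # S_k
        delta = sum((coeffs[i] - group_sum) ** 2 for i in indices)   # delta_k
        pattern.append(delta)
    return tuple(pattern)


def ac_histogram(blocks):
    """Count how many times each AC-pattern occurs in an iterable of blocks."""
    return Counter(ac_pattern(block) for block in blocks)
```

Using the pattern tuple as the histogram key keeps the counting step independent of how the coefficients were quantized beforehand.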


2.3. Histogram of the low frequency components DCs 

The DC-pattern describes the global characteristics using the gradient between each block and its
neighbors (inter-block). DC-DirecVec [9] is defined and used as a characteristic for the DC-pattern. More
precisely, the DC-pattern is defined as the set of directions having the greatest differences between the DC value
of the current block and the DC values of the neighboring blocks. The absolute values of these differences are
arranged in descending order and the first $\gamma$ directions with the largest differences form the DC-pattern.
The number of occurrences of each DC-pattern gives us the histogram $H_{DC}$.
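
The selection of directions can be sketched as follows. This is only an illustration under stated assumptions: the numbering of the 8 neighbor directions, the handling of blocks on the image border (missing neighbors are simply skipped) and the tie-breaking rule are not specified in the text, and the names `NEIGHBOURS` and `dc_pattern` are hypothetical.

```python
# Offsets (row, col) of the 8 neighbouring blocks; the numbering is an assumption.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def dc_pattern(dc, row, col, gamma):
    """Return the gamma direction indices with the largest absolute DC differences.

    dc is a 2-D list holding the DC value of every block of the image.
    """
    diffs = []
    for direction, (d_row, d_col) in enumerate(NEIGHBOURS):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(dc) and 0 <= c < len(dc[0]):  # skip neighbours outside the image
            diffs.append((abs(dc[row][col] - dc[r][c]), direction))
    # Descending difference; ties are broken by direction index to stay deterministic.
    diffs.sort(key=lambda item: (-item[0], item[1]))
    return tuple(direction for _, direction in diffs[:gamma])
```

Counting the returned tuples over all blocks of an image, for example with `collections.Counter`, would then give the DC-pattern histogram.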


2.4. Feature descriptor 

For each block, the AC-pattern is formed from 9 coefficients and the DC-pattern is built from the DC
coefficient of the block itself and the differences between this value and the DCs of its 8 neighbors. The
concatenation of the two histograms, $H_{AC}$ and $H_{DC}$, forms the index of an image, as illustrated for the
dinosaur image (Figure 3(a)) and the horse image (Figure 3(b)), and it is used to perform the search. In this
context, the descriptor is defined as (3):


$$H = [(1 - \alpha) \times H_{AC}, \alpha \times H_{DC}], \tag{3}$$


where $\alpha$ is the weight that controls the impact of the AC-pattern and DC-pattern histograms. This parameter
can be set to improve search accuracy.
Figure 3. Histograms of the AC-patterns and DC-patterns for the RBIR system: (a) combined AC-DC
patterns for the dinosaur and (b) combined AC-DC patterns for the horse


3. SIMILARITY MEASURE

The feature vector can be seen as a normalized vector. This allows the statistical information of images to be
compared by calculating the distance between histograms of feature patterns [34]. In this section, we
detail the similarity measures between the query histogram $H_Q$ and the database histograms $H_D$. These
measures compare the histograms coefficient to coefficient at the same index, i.e., they compare $H_Q(i)$
and $H_D(i)$ for each $i$, but not $H_Q(i)$ and $H_D(j)$ for $i \neq j$.


3.1. Coefficient to coefficient similarity measure 
This measurement category compares the corresponding coefficients of the histograms Hg and Hp. 
The similarity between two histograms is the combination of these coefficient-to-coefficient comparisons. 


3.1.1. Minkowski distance 
The Minkowski distance is defined as (4): 


$$d_{L_p}(H_Q, H_D) = \left( \sum_{i=1}^{N} |H_Q(i) - H_D(i)|^p \right)^{1/p}, \tag{4}$$


where $p = 1$, $2$ or $\infty$; the distance is then known as the Manhattan distance, the Euclidean distance or the
Chebyshev distance, respectively. These three distances are the most widely used in image search.


3.1.2. Histogram intersection 
The histogram intersection is defined as (5): 


$$d_{\cap}(H_Q, H_D) = \sum_{i=1}^{N} \min(H_Q(i), H_D(i)). \tag{5}$$


Note that this measure is inexpensive to compute because it consists only of minimum operations and additions.


3.1.3. Chi-square distance
The chi-square distance ($\chi^2$) is used to compare two sets of data and to determine whether they are drawn from
the same distribution function.


$$d_{\chi^2}(H_Q, H_D) = \sum_{i=1}^{N} \frac{(H_Q(i) - H_D(i))^2}{H_Q(i) + H_D(i)}, \tag{6}$$


where $i$ indexes the components of the descriptor, $H_Q$ and $H_D$ are the two descriptors, and $N$ is the
dimension of the descriptor. Note that in our system, the similarity between the request image and an image in
the database is estimated by the chi-square distance (6).
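
The three coefficient-to-coefficient measures of this section can be sketched in a few lines. The code is a minimal illustration: it assumes histograms stored as equal-length lists of non-negative numbers, it uses the common symmetric form of the chi-square distance, and it skips bins that are empty in both histograms, which is a convention chosen here and not stated in the text. The function names are hypothetical.

```python
import math


def minkowski(h_q, h_d, p):
    """Minkowski distance; p = 1, 2 or math.inf (Manhattan, Euclidean, Chebyshev)."""
    diffs = [abs(a - b) for a, b in zip(h_q, h_d)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def intersection(h_q, h_d):
    """Histogram intersection: sum of the bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h_q, h_d))


def chi_square(h_q, h_d):
    """Symmetric chi-square distance; bins empty in both histograms are skipped."""
    total = 0.0
    for a, b in zip(h_q, h_d):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total
```

Note that the intersection grows with similarity whereas the two distances shrink, so a ranking based on the intersection must sort in the opposite direction.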


4. PERFORMANCE EVALUATION 

Objective performance measures are necessary to evaluate and compare different indexing and image search
systems. They allow researchers to fully understand the limitations of their
algorithms and to compare their results objectively with those of other algorithms. In this section, we discuss
some performance evaluation measures for indexing and image search systems.


4.1. Precision and recall 

The most commonly used performance measure for CBIR and RBIR is the precision-recall curve.
Precision measures the capacity of the system to return a maximum of relevant images among the images it
returns; it is defined as the ratio between the number of relevant images retrieved and the total number
of images retrieved. Recall measures the ability of the system to find the relevant images in the database
for a given query; it is defined as the ratio between the number of relevant images retrieved
and the total number of relevant images in the database. The system which gives the better precision for the same
recall is the more efficient system.


$$\text{Precision} = \frac{r}{r + s}, \tag{7}$$

$$\text{Recall} = \frac{r}{r + t}, \tag{8}$$


where $r$ is the number of relevant images retrieved, $s$ is the number of irrelevant images retrieved, and $t$ is
the number of relevant images not retrieved from the database.
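
A small sketch of (7) and (8) is given below. It assumes that images are identified by hashable labels, that the retrieved list contains no duplicates, and that empty inputs yield a value of zero; the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: list of image identifiers returned by the system.
    relevant:  collection of identifiers that are relevant to the query.
    """
    relevant = set(relevant)
    r = sum(1 for image in retrieved if image in relevant)  # relevant and retrieved
    s = len(retrieved) - r                                  # irrelevant but retrieved
    t = len(relevant) - r                                   # relevant but not retrieved
    precision = r / (r + s) if retrieved else 0.0
    recall = r / (r + t) if relevant else 0.0
    return precision, recall
```

Evaluating the function on growing prefixes of a ranked list gives the points of a precision-recall curve.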


4.2. Mean average precision (MAP) 


Images are considered to be similar if the distance (6) between their feature descriptors is less than
a threshold. The performance of the systems can be evaluated using the precision-recall curves as shown in
subsection 4.1. MAP is a way to summarize the precision-recall graph as a single value. The MAP, for all requests,
is defined by (9):


$$\text{MAP} = \frac{1}{|Q|} \sum_{q \in Q} AP(q), \tag{9}$$


where $Q$ is the set of requests and $AP(q)$ is defined as (10):


$$AP(q) = \frac{1}{N_R} \sum_{n=1}^{N_R} P(R_n), \tag{10}$$


where $N_R$ is the number of relevant images for the request $q$, $R_n$ is the recall after the $n$-th relevant
image is retrieved, and $P(R_n)$ is the precision when the recall is $R_n$.


In addition to precision and recall, MAP takes into account the rank positions of the relevant images.
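
The computation of (9) and (10) can be sketched as follows. The sketch assumes that each query comes with a ranked list of retrieved identifiers and a set of relevant identifiers, and that relevant images missing from the ranked list contribute a precision of zero (the sum is divided by the total number of relevant images); both the convention and the function names are assumptions made for the example.

```python
def average_precision(ranked, relevant):
    """AP of one query: mean of the precisions measured at each relevant hit."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            precision_sum += hits / rank  # P(R_n) at the n-th relevant image
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """MAP over an iterable of (ranked, relevant) pairs."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```

Because every precision value is measured at the rank of a relevant image, moving a relevant image towards the top of the list always increases the score.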


4.3. Average retrieval rate (ARR) 
The retrieval rate (RR) for a query is defined as the percentage of relevant images
retrieved out of the total number of relevant images in the database, observed in the first $K$ images retrieved:


$$RR = \frac{r}{r + t}. \tag{11}$$


ARR is defined as the average value of the retrieval rates (RR) obtained on the first $K$ images retrieved for each
request.


5. SIMILARITY MEASURE IN REGION-BASED SYSTEM 

One of the problems of an image-by-object search system is how to compare two images,
that is to say, how to define a similarity measure between images. A simple solution, adopted by the early
systems [22], is to use an individual region-to-region similarity measure. To use such a system, the user
is expected to select one or more regions from the query image to perform the search. As discussed in [21], due to
the uncontrollable nature of images, extracting objects from images automatically and precisely is still beyond
the state of the art of computer vision techniques [20], [35]. Instead, systems tend to partition
an object into multiple regions, none of which is representative of the semantic object [17]. It is therefore
often difficult for users to determine which regions should be used for the search. To provide users with a simpler
interface and to reduce the effect of inaccurate segmentation, image-to-image similarity measures that combine the
properties of all regions have been proposed in [21], [22]. These systems only require users to choose the query
image, and therefore free them from difficult decisions about regions. For example, the SIMPLIcity system uses
integrated region matching (IRM) as a measure of similarity, in which the importance of a region is decided by
the percentage of the image area it occupies. By allowing a many-to-many relationship between regions [24], the
approach is robust against inaccurate segmentation. In our proposed system, two similarity measures are available:
by region and by image, depending on whether the search is by region or by complete image.


5.1. Image to image similarity measure

Based on the assumption that any region could be useful when evaluating similarity [21], all regions of
each image are considered. First of all, it should be noted that the similarity between a region of the query
image and a region of a database image is computed in our system with the $\chi^2$ distance (6). The similarity
between two images is obtained by computing the distances between all pairs of regions after
segmentation and then taking the sum of the minimum distances between the regions. Table 1 and
Figure 4 show an example of this process. After the construction
of Table 1, we sum the values of its last row to obtain the similarity between the two images.


Table 1. Similarity between regions 


B/R           Foreground   Background
Foreground       0.2          0.7
Background       0.3          0.5

Min              0.2          0.5


[Figure 4 diagram: feature histograms (feature index on the horizontal axis) of the foreground and background
regions of the two images, the chi-square distances between them (0.2, 0.7, 0.3 and 0.5) and the resulting
sum of minima, 0.7.]


Image to image: The distance is the sum of minimum distances between the different regions.


Region to region: The distance is the minimum of the distances between the different regions.


Figure 4. Similarity measure between images 


5.2. Region to region similarity measure 

The second possibility offered by our system is the region-to-region similarity computation. The user
selects one of the two possible regions (foreground or background) of the query image to perform
the search. The region-to-region similarity measure (Figure 4) consists in calculating the distances between the
selected region and the regions of each image in the database, then taking the minimum of these distances.
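
Both measures of this section can be illustrated with the values of Table 1. The sketch assumes that the distance matrix stores one row per region of the database image and one column per region of the query image, which is one possible reading of the table since its "B/R" header is not explained in the text; the names `TABLE_1`, `image_to_image` and `region_to_region` are hypothetical.

```python
# Chi-square distances of Table 1: rows and columns are (foreground, background).
# The row/column assignment to database and query images is an assumption.
TABLE_1 = [
    [0.2, 0.7],
    [0.3, 0.5],
]


def image_to_image(dist):
    """Sum, over the columns, of the minimum distance found in each column."""
    n_cols = len(dist[0])
    return sum(min(row[j] for row in dist) for j in range(n_cols))


def region_to_region(dist, selected):
    """Minimum distance between the selected region (column index) and all regions."""
    return min(row[selected] for row in dist)
```

With the values of Table 1, the column minima are 0.2 and 0.5, so the image-to-image distance is their sum, while selecting the foreground region gives a region-to-region distance of 0.2.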


6. RESEARCH METHOD 

In the proposed system (Figure 5), after image segmentation [20], two types of parameters
can be adjusted: the parameters dedicated to indexing the images (regions), described in subsection 6.1,
and the parameters dedicated to retrieval, described in subsection 6.2.


6.1. Indexing parameters 
There are several indexing parameters used in the proposed system. These parameters must be
optimized to improve the performance of the system.


— $QP_{AC}$ and $QP_{DC}$: quantization parameters for AC and DC, respectively
If these parameters are too small, the total number of different AC-patterns and DC-patterns becomes
large, which makes the histogram generation process more complicated and time consuming. On the other
hand, if $QP_{AC}$ and $QP_{DC}$ are too large, then the rightmost AC-pattern and DC-pattern
coefficients tend towards zero, which decreases the search performance. Therefore, a compromise between
performance and computation time must be found when choosing these parameters, see subsection 2.2.

— $\gamma$: number of first directions with the largest DC differences in the DC-pattern
This parameter can be adjusted to obtain better search performance, see subsection 2.3.


— ACBins and DCBins: numbers of AC bins and DC bins needed to build the characteristic histogram
Histograms are constructed by ordering the feature occurrences, so that the resulting histogram has its bins
sorted in descending order. The size of this histogram is a free parameter that can be adjusted to improve search
performance. Eliminating a few bins that represent irrelevant characteristics improves search performance.
On the other hand, eliminating too many bins degrades performance. The optimal length, which
gives the best retrieval results, must therefore be chosen.


— NC: the number of AC coefficients entering into the construction of the AC-pattern
NC represents the number of AC coefficients that form each of the diagonal, horizontal and vertical groups seen in
subsection 2.2. This parameter can take the value $NC = 2$ or $NC = 3$.

Figure 5 illustrates the learning process used to optimize the indexing parameters mentioned above.


[Figure 5 flowchart: training image → segmentation (foreground and background) → feature generation (block
transform, quantization) → feature histogram formation → histogram combination → database retrieval process →
retrieval results. The output of training is a set of optimized parameters: $QP_{AC}$, $QP_{DC}$, $\gamma$, NC for
feature generation and ACBins, DCBins for histogram formation.]


Figure 5. Learning process optimizing the indexing parameters $QP_{AC}$, $QP_{DC}$, $\gamma$, NC, ACBins, DCBins


6.2. Search parameters 
There are several search parameters used in the proposed system. These parameters must also be
optimized to improve the performance of the system.
— $\beta$: a weight that combines the foreground histogram with that of the background.
The $\beta$ parameter can be adjusted to improve the performance of the proposed system.
— $\alpha_{Fore}$: a weight that combines the AC-pattern histogram with the DC-pattern histogram, for the foreground.
Equation (12) gives the combined histogram of the foreground, $H_{Fore}$, where $\alpha_{Fore}$ is a weight parameter
that can be adjusted to improve the quality of the search.


$$H_{Fore} = [(1 - \alpha_{Fore}) \times H_{AC_{Fore}}, \alpha_{Fore} \times H_{DC_{Fore}}], \tag{12}$$

— $\alpha_{Back}$: a weight that combines the AC-pattern histogram with the DC-pattern histogram, for the
background.
Equation (13) gives the combined histogram of the background, $H_{Back}$, where $\alpha_{Back}$ is a weight parameter
that can also be adjusted to improve the quality of the search.


$$H_{Back} = [(1 - \alpha_{Back}) \times H_{AC_{Back}}, \alpha_{Back} \times H_{DC_{Back}}], \tag{13}$$


Demonstration: substituting (12) and (13) into:


$$H_{Image} = [(1 - \beta) \times H_{Fore}, \beta \times H_{Back}], \tag{14}$$

gives the following (15):


$$H_{Image} = [(1 - \alpha_{Fore} - \beta + \alpha_{Fore}\beta) \times H_{AC_{Fore}}, (\alpha_{Fore} - \alpha_{Fore}\beta) \times H_{DC_{Fore}}, (\beta - \beta\alpha_{Back}) \times H_{AC_{Back}}, \beta\alpha_{Back} \times H_{DC_{Back}}]. \tag{15}$$


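Several cases may appear for the full-image histogram $H_{Full}$ (the $H_{Image}$ of (14)):

1. If $\alpha_{Fore} = 1$ and $\alpha_{Back} = 1$, then $H_{Full} = [(1 - \beta) \times H_{DC_{Fore}}, \beta \times H_{DC_{Back}}]$.
We have a search with the DC in the foreground and the background.
If $\beta = 1$, then $H_{Full} = H_{DC_{Back}}$: a search with the background DC.
If $\beta = 0$, then $H_{Full} = H_{DC_{Fore}}$: a search with the foreground DC.

2. If $\alpha_{Fore} = 0$ and $\alpha_{Back} = 0$, then $H_{Full} = [(1 - \beta) \times H_{AC_{Fore}}, \beta \times H_{AC_{Back}}]$.
We have a search with the AC in the foreground and the background.
If $\beta = 1$, then $H_{Full} = H_{AC_{Back}}$: a search with the background AC.
If $\beta = 0$, then $H_{Full} = H_{AC_{Fore}}$: a search with the foreground AC.


3. If $\alpha_{Fore} = \alpha_{Back} = \beta = \frac{1}{2}$, then
$H_{Full} = [\frac{1}{4} H_{AC_{Fore}}, \frac{1}{4} H_{DC_{Fore}}, \frac{1}{4} H_{AC_{Back}}, \frac{1}{4} H_{DC_{Back}}]$.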

Table 2 shows an example of the combinations of the parameters $\{\alpha_{Fore}, \alpha_{Back}, \beta\}$ when these
take binary values. From the results, we notice that some combinations of $\{\alpha_{Fore}, \alpha_{Back}, \beta\}$
give the same $H_{Full}$. These are the combinations 0 and 2, 1 and 5, 3 and 7 and finally 4 and 6.


Table 2. $H_{Full}$ for binary combinations of the parameters $\{\alpha_{Fore}, \alpha_{Back}, \beta\}$


Combination   $\alpha_{Fore}$   $\alpha_{Back}$   $\beta$   $H_{Full}$
0             0                 0                 0         $H_{AC_{Fore}}$
1             0                 0                 1         $H_{AC_{Back}}$
2             0                 1                 0         $H_{AC_{Fore}}$
3             0                 1                 1         $H_{DC_{Back}}$
4             1                 0                 0         $H_{DC_{Fore}}$
5             1                 0                 1         $H_{AC_{Back}}$
6             1                 1                 0         $H_{DC_{Fore}}$
7             1                 1                 1         $H_{DC_{Back}}$

7. HISTOGRAM OPTIMIZATION FOR THE BORDER BLOCKS 

In this section, we present a way to optimize our search and the content of the full histogram. It
consists in separating the AC-patterns and DC-patterns at the border level of the object. The construction
process of the combined histogram by the proposed method is shown in Figure 6. Figure 6(a) presents the
combined histogram taking into account the parameters $\{\alpha_{Fore}, \alpha_{Back}, \beta\}$ and
$\{\gamma_{CF}, \gamma_{FF}, \gamma_{CB}, \gamma_{FB}\}$, and Figure 6(b) presents an illustrative diagram of an
image composed of two objects, foreground and background. The foreground histogram, $H_{Fore}$, is combined from
the histogram of the interior (full) blocks, $H_{FF}$, and the histogram of the border (contour) block segments,
$H_{CF}$, of the foreground. Similarly, the background histogram, $H_{Back}$, is combined from the histogram of
the interior (full) blocks, $H_{FB}$, and the histogram of the border (contour) block segments, $H_{CB}$, of the
background. The overall combined histogram is calculated from the foreground, $H_{Fore}$, and background,
$H_{Back}$, histograms. The following equations illustrate
the use of these parameters for the construction of the image descriptor.

[Figure 6 panels: (a) combined histogram $H_{Full} = [(1 - \beta) \times H_{Fore}, \beta \times H_{Back}]$, with border
blocks processed by SA-DCT; (b) illustrative diagram.]


Figure 6. Process of construction of the combined histogram by the proposed method: (a) combined histogram
taking into account the proposed parameters and (b) illustrative diagram showing an image composed of two
objects, foreground and background


— For the foreground
(a) Foreground border (contour) blocks


$$H_{CF} = [(1 - \gamma_{CF}) \times H_{AC_{CF}}, \gamma_{CF} \times H_{DC_{CF}}], \tag{16}$$


$H_{CF}$: combined histogram of patterns (AC-DC) from the segments of the border blocks.
$H_{AC_{CF}}$: AC-pattern histogram from the segments of the border blocks.
$H_{DC_{CF}}$: DC-pattern histogram from the segments of the border blocks.
$\gamma_{CF}$: a weight to combine $H_{CF}$.


(b) Foreground interior (full) blocks


$$H_{FF} = [(1 - \gamma_{FF}) \times H_{AC_{FF}}, \gamma_{FF} \times H_{DC_{FF}}], \tag{17}$$


$H_{FF}$: combined histogram of patterns (AC-DC) from the interior (full) blocks.
$H_{AC_{FF}}$: AC-pattern histogram from the interior (full) blocks.
$H_{DC_{FF}}$: DC-pattern histogram from the interior (full) blocks.
$\gamma_{FF}$: a weight to combine $H_{FF}$.


(c) The foreground blocks (interior + border)


$$H_{Fore} = [(1 - \alpha_{Fore}) \times H_{FF}, \alpha_{Fore} \times H_{CF}], \tag{18}$$


— For the background
(a) Background border (contour) blocks


$$H_{CB} = [(1 - \gamma_{CB}) \times H_{AC_{CB}}, \gamma_{CB} \times H_{DC_{CB}}], \tag{19}$$


$H_{CB}$: combined histogram of patterns (AC-DC) from the segments of the border blocks.
$H_{AC_{CB}}$: AC-pattern histogram from the segments of the border blocks.
$H_{DC_{CB}}$: DC-pattern histogram from the segments of the border blocks.
$\gamma_{CB}$: a weight to combine $H_{CB}$.


(b) Background interior (full) blocks


$$H_{FB} = [(1 - \gamma_{FB}) \times H_{AC_{FB}}, \gamma_{FB} \times H_{DC_{FB}}], \tag{20}$$


$H_{FB}$: combined histogram of patterns (AC-DC) from the interior (full) blocks.
$H_{AC_{FB}}$: AC-pattern histogram from the interior (full) blocks.
$H_{DC_{FB}}$: DC-pattern histogram from the interior (full) blocks.
$\gamma_{FB}$: a weight to combine $H_{FB}$.


(c) The background blocks (interior + border)


$$H_{Back} = [(1 - \alpha_{Back}) \times H_{FB}, \alpha_{Back} \times H_{CB}], \tag{21}$$


— For the full image
The global descriptor is obtained by combining the foreground and background histograms.


$$H_{Image} = [(1 - \beta) \times H_{Fore}, \beta \times H_{Back}]. \tag{22}$$


Figures 7(a) and 7(b) represent the descriptors of the flower and horse images, respectively, calculated by the
method described above.

Figure 7. Combined histograms of the first high frequency occurrences of AC-patterns and DC-patterns for
the RBIR principle of optimization: (a) global descriptor of the flower image and (b) global descriptor of the
horse image
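
The three levels of weighting in (16) to (22) can be sketched as nested weighted concatenations. The sketch assumes histograms stored as lists of numbers, each normalized to sum to one; the names `combine`, `region_descriptor` and `full_descriptor` are hypothetical.

```python
def combine(first, second, weight):
    """Weighted concatenation [(1 - weight) x first, weight x second]."""
    return [(1 - weight) * v for v in first] + [weight * v for v in second]


def region_descriptor(h_ac_interior, h_dc_interior, h_ac_border, h_dc_border,
                      gamma_interior, gamma_border, alpha):
    """Descriptor of one region (foreground or background)."""
    h_interior = combine(h_ac_interior, h_dc_interior, gamma_interior)  # (17) or (20)
    h_border = combine(h_ac_border, h_dc_border, gamma_border)          # (16) or (19)
    return combine(h_interior, h_border, alpha)                         # (18) or (21)


def full_descriptor(h_fore, h_back, beta):
    """Global descriptor of the image, equation (22)."""
    return combine(h_fore, h_back, beta)
```

Since every level applies complementary weights, the global descriptor keeps a total mass of one when each input histogram sums to one, so descriptors built with different parameter values remain comparable in scale.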


8. RESULTS AND DISCUSSION
8.1. Standard benchmark data-sets

To show the efficiency of the proposed system, several tests were carried out on two image data-sets:
Corel-1000 and Caltech-256. The Corel-1000 database, collected by Wang et al. [36], contains
1000 images of size 256 × 384 or 384 × 256, classified into 10 semantic categories: Africans,
buildings, beach, buses, dinosaurs, elephants, flowers, horses, mountains and foods. The Caltech-256 database,
collected by Griffin et al. [37], contains 30,607 images from 256 categories, with 80 to 827 images per category.
In our experiments, we randomly selected 10 categories from Caltech-256, which contain 1,299 images. The
10 categories are: AK-47, American-flags, backpacks, baseball-bats, baseball-gloves, basketball hoops, bats,
bathtub, beer-mug and blimp.


8.2. The average precision-recall results 
Figure 8(a) presents the improvement of the average precision-recall results, on the Corel-1000 database,
for the proposed approach (SA-DCT + histogram optimization) over the region-based approach (SA-DCT) and the
conventional content-based approach (DCT) [9]. The same curves, for Caltech-256, are shown in
Figure 8(b). The best performance is obtained by using the global combined and optimized descriptor
of the foreground and the background together. The proposed system attempts to overcome the limitation of
global-based retrieval (CBIR/RBIR with DCT/SA-DCT) systems by emphasizing the target objects only and
minimizing the influence of the background.


8.3. The mean average precision results (MAP) 

We have compared our proposed method (PM) with several existing benchmark RBIR
methods: regional convolution mapping feature with integrated category matching (RCMF+ICM) [17],
adaptive region matching (ARM) [25], SIMPLIcity [21], integrated region matching (IRM) [25], MN-MIN
[23] and shape-adaptive DCT [30]. We also compare our approach with CBIR approaches such as DCT+SVD
and DCT [9]. The specific settings of each method can be found in the related references.

From Table 3, we can see that our proposed method (PM) performs better than the 8 other methods (RBIR and
CBIR) over the 10 categories of the Corel-1000 (Figure 9(a)) and Caltech-256 (Figure 9(b)) databases.
According to Table 3, the proposed RBIR system generally works better than the other existing RBIR systems,
except for four classes from the Corel-1000 database; this is due to poor segmentation of the images from
these classes, which also lack a clear semantic object. It can be concluded from the experimental results
that the proposed system improves performance on the Corel-1000 and Caltech-256 databases. In addition, a
minimum number of AC coefficients and a small number of histogram coefficients were used to reduce
computation time.
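
For reference, the sketch below computes MAP with the standard information-retrieval definition (precision
taken at the rank of each relevant image, averaged per query, then averaged over queries). This section does
not spell out the exact formula used, so the definition, the function names and the toy inputs are
assumptions.

```python
def average_precision(ranked_relevance, n_relevant):
    """Average of the precision values measured at each relevant hit."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision.

    Each query is a pair (ranked_relevance, n_relevant).
    """
    return sum(average_precision(rel, n) for rel, n in queries) / len(queries)


# Toy data (hypothetical): relevance flags of the ranked results of two queries.
queries = [([True, False, True], 2),
           ([False, True], 1)]
map_score = mean_average_precision(queries)
```

A per-category score such as those reported in a MAP table would be obtained by restricting `queries` to
the query images of that category.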
Figure 8. The average precision-recall results on (a) the Corel-1000 database and (b) the Caltech-256
database, for the proposed method (PM), the region-based approach and the content-based approach [9]


Table 3. The mean average precision (MAP) comparison between different methods on the Corel-1000 database
Method      African  Beach  Buildings  Buses  Dinosaurs  Elephants  Flowers  Horses  Mountains  Foods  Average
PM          79.45    65.36  78.94      99.22  100        92.60      97.17    96.28   63.87      88.99  86.19
SA-DCT      75.45    52.50  76.30      97.25  100        90.20      95.50    94.25   61.85      78.95  82.18
RCMF + ICM  66.3     22.1   70.6       99.1   100        90.8       99.2     95.1    71.5       85.3   83.06
DCT + SVD   88.98    74.85  75.92      78.75  99.90      91.89      83.78    95.65   73.98      78.97  84.27
MN-ARM      74.23    46.35  74.51      80.60  99.80      62.42      94.84    92.91   61.80      78.88  76.63
MN-IRM      73.80    42.86  74.32      74.24  99.80      61.00      95.23    93.49   51.33      66.54  73.26
SIMPLIcity  52.73    25.52  56.54      49.40  98.50      45.51      78.73    78.28   39.00      39.13  56.33
MN-MIN      43.40    29.43  31.82      46.63  90.11      33.07      64.94    63.62   27.90      35.21  46.61
DCT [9]     52.45    27.20  27.10      36.95  93.05      46.90      53.45    61.65   17.10      37.4   45.33


8.4. Semantic retrieval 

As stated previously, the proposed system allows the similarity to be measured region to region or image
to image, according to the content of the query image. This helps reduce the impact of negative interference
regions and the loss of important information. In what follows, we evaluate the system with optimized
global parameters with respect to these two aspects.


8.4.1. Decreasing negative influence of interference regions 

Since the image-to-image similarity measure takes into account the properties of all the regions,
it is necessarily affected by interfering regions. The region-to-region similarity measure, however, can
solve this problem. We consider the beach and building categories of the Corel-1000 database, which have
interfering regions. Figures 10(a) and 10(b) compare the results of our system with optimized parameters,
depending on whether the search is region to region or image to image, on the beach and building categories
respectively. The comparison in Figure 10 shows that, when searching for the most similar images (average
retrieval rate P(N)), the region-to-region similarity measure is consistently superior to the
image-to-image measure.
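
The effect of an interfering region can be illustrated with a small Python sketch. Everything in it is an
assumption made for illustration: the L1 histogram distance, the weights `alpha_fore` and `alpha_back`, the
definition of P(N) as the fraction of relevant images among the top N, and the toy two-bin histograms. It is
not the system's actual implementation.

```python
def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def region_to_region(query, candidate):
    """Compare only the main (foreground) regions."""
    return l1_distance(query["fore"], candidate["fore"])


def image_to_image(query, candidate, alpha_fore=0.5, alpha_back=0.5):
    """Compare all regions: weighted sum of foreground and background distances."""
    return (alpha_fore * l1_distance(query["fore"], candidate["fore"])
            + alpha_back * l1_distance(query["back"], candidate["back"]))


def retrieval_rate(query, database, distance, n):
    """P(N): fraction of the N nearest images that share the query label."""
    ranked = sorted(database, key=lambda img: distance(query, img))
    return sum(img["label"] == query["label"] for img in ranked[:n]) / n


# Toy data (hypothetical): the irrelevant image shares the query background.
query = {"label": "beach", "fore": [0.8, 0.2], "back": [0.1, 0.9]}
database = [
    {"label": "beach", "fore": [0.7, 0.3], "back": [0.9, 0.1]},
    {"label": "building", "fore": [0.2, 0.8], "back": [0.1, 0.9]},
]
p1_region = retrieval_rate(query, database, region_to_region, 1)
p1_image = retrieval_rate(query, database, image_to_image, 1)
```

In this toy case the matching background pulls the irrelevant image to the top of the image-to-image
ranking, while the region-to-region ranking ignores it.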
Figure 9. Mean average precision (MAP) for (a) Corel-1000 database images and (b) Caltech-256 database
images
Figure 10. The average retrieval rate P(N) of the region-to-region and image-to-image search: (a) beach and
(b) building from Corel-1000 database


8.4.2. The negative influence of important information loss

For images without a main region, adopting the region-to-region similarity measure to search for
relevant images gives unsatisfactory results because of the significant loss of information. We therefore
evaluate our system on the African and food categories of the Corel-1000 database, whose images have no
main object. Figures 11(a) and 11(b) compare the results of our system with optimized parameters,
depending on whether the search is region to region or image to image, on the African and food categories
respectively. The comparison results are denoted by: African (region-to-region), African (image-to-image),
food (region-to-region), and food (image-to-image). The comparison in Figure 11 shows that, when searching
for the most similar images (average retrieval rate P(N)), the image-to-image similarity measure is
considerably superior to the region-to-region measure.
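
Choosing the measure according to the content of the query can be sketched as a simple dispatch. How the
system actually decides is not described in this section, so the `has_main_object` flag, the L1 distance,
the equal weights and the toy histograms below are all hypothetical.

```python
def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def region_to_region(query, candidate):
    """Compare only the main (foreground) regions."""
    return l1_distance(query["fore"], candidate["fore"])


def image_to_image(query, candidate, alpha_fore=0.5, alpha_back=0.5):
    """Compare all regions: weighted sum of foreground and background distances."""
    return (alpha_fore * l1_distance(query["fore"], candidate["fore"])
            + alpha_back * l1_distance(query["back"], candidate["back"]))


def select_distance(query):
    """Use region-to-region only when the query has a main object (assumed flag)."""
    return region_to_region if query.get("has_main_object") else image_to_image


# Toy data (hypothetical): without a main object the "foreground" is arbitrary,
# so comparing it alone discards the information carried by the rest of the image.
query = {"label": "food", "has_main_object": False,
         "fore": [0.5, 0.5], "back": [0.9, 0.1]}
database = [
    {"label": "food", "fore": [0.1, 0.9], "back": [0.8, 0.2]},
    {"label": "african", "fore": [0.5, 0.5], "back": [0.1, 0.9]},
]
distance = select_distance(query)
best = min(database, key=lambda img: distance(query, img))
worst_choice = min(database, key=lambda img: region_to_region(query, img))
```

Here `best` is the relevant image, whereas forcing the region-to-region measure (`worst_choice`) returns
the irrelevant one.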


8.5. Some retrieval examples 

Moreover, retrieval results on the Corel-1000 and Caltech-256 datasets are shown for different categories
of images in Figures 12(a) and 12(b) (see the appendix). These two examples show that irrelevant images
can be removed by using our histogram optimization approach.
Figure 11. The average precision P(N) of region-to-region and image-to-image search: (a) African and
(b) food, from Corel-1000 database


9. CONCLUSION 

First, the concept of RBIR was introduced. Second, the principle of the proposed method based on
SA-DCT was detailed. Finally, the experimental results were presented. The proposed method is based
on the histogram of DCT + SA-DCT coefficients, depending on whether the parameters {α_Fore, α_Back, 2} are
optimized. An optimization method has been proposed to improve search performance. It consists in separating
the histogram of the object (foreground and background) into a histogram resulting from the patterns (AC and
DC) of the interior blocks and a histogram resulting from the patterns (AC and DC) of the border blocks
of the segments, by using additional parameters {γ_CF, γ_FF, γ_CB, γ_FB}. The proposed system was able to
correct the defects of the classic system (CBIR with DCT only). Two types of similarity measure are adopted
by our RBIR system: image to image and region to region. The experimental results show that these two
measures make it possible to reduce the negative influence of interfering regions and the harmful influence
of a significant loss of information. Experimental results on the Corel-1000 and Caltech-256 databases show
that the proposed method is more efficient than some well-established RBIR methods. As future work, we plan
to use deep learning (CNN) to optimize, in a first stage, the indexing parameters QP_AC, QP_DC, γ, ACBins,
DCBins and NC and, in a second stage, the search parameters {α_Fore, α_Back, 2} and the histogram
optimization parameters {γ_CF, γ_FF, γ_CB, γ_FB}.
Figure 12. Example of image retrieval with histogram optimization on the Corel-1000 and Caltech-256
databases. The first image is the query image; the 19 other images are those retrieved with the proposed
method (PM) for (a) a horse image and (b) an AK-47 image